International Journal of Data Science and Big Data Analytics
Volume 4, Issue 2, November 2024
Research Paper | Open Access
Comparing Machine Learning Algorithms for Breast Cancer Diagnosis: Wisconsin Diagnostic Dataset Analysis
1American International University-Bangladesh (AIUB), Dhaka 1229, Bangladesh. E-mail: arjunkumarbosu@gmail.com
*Corresponding Author
Int.J.Data.Sci. & Big Data Anal. 4(2) (2024) 1-11, DOI: https://doi.org/10.51483/IJDSBDA.4.2.2024.1-11
Received: 08/07/2024 | Accepted: 19/10/2024 | Published: 05/11/2024
Breast cancer has a high incidence and death rate worldwide, so accurate diagnostic tools are urgently needed. In this work, seven machine learning algorithms are evaluated on the benchmark Wisconsin Breast Cancer dataset: Random Forest, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Naive Bayes, Logistic Regression, Decision Tree, and XGBoost. Performance is compared on four metrics (accuracy, precision, recall, and F1-score) and examined further with confusion matrices, ROC curves, and feature analysis. Logistic Regression performs best, with an accuracy of approximately 97.37% and a balanced F1-score of approximately 97.90%. Its recall of approximately 98.59% shows that it is very good at detecting true positives. Random Forest is slightly less precise than Logistic Regression but still ranks second, with an accuracy of about 96.49%. SVM is more conservative, combining a high precision of about 97.14% with an accuracy of about 95.61%. KNN and Decision Tree each reach an accuracy of around 94.74%. XGBoost and Naive Bayes also achieve strong accuracy, at approximately 95.61% and 96.49%, respectively. The study emphasizes weighing the trade-offs among these metrics and highlights the promise of machine learning, including ensemble models, for improving predictive accuracy in breast cancer detection and, in turn, patient outcomes.
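The four metrics named in the abstract all derive from the counts in a binary confusion matrix. The following is a minimal sketch of that arithmetic in plain Python, not the authors' code. The `ConfusionCounts` type, the `classification_metrics` function, and the example counts are assumptions introduced here for illustration; the counts were picked so that the results land near the Logistic Regression figures quoted above and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary confusion matrix (hypothetical helper type)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return accuracy, precision, recall, and F1 as fractions in [0, 1].

    Assumes every denominator is non-zero, i.e. at least one predicted
    positive and at least one actual positive are present.
    """
    total = c.tp + c.fp + c.fn + c.tn
    precision = c.tp / (c.tp + c.fp)  # share of predicted positives that are correct
    recall = c.tp / (c.tp + c.fn)     # share of actual positives that are found
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": precision,
        "recall": recall,
        # F1 is the harmonic mean of precision and recall.
        "f1": 2 * precision * recall / (precision + recall),
    }


# Illustrative counts only (an assumption, not data from the paper).
example = ConfusionCounts(tp=70, fp=2, fn=1, tn=41)
metrics = classification_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With these illustrative counts, three errors in 114 cases give an accuracy of 97.37%, a recall of 98.59%, and an F1-score of 97.90%. The example also shows why a model can trade precision against recall while accuracy barely moves, which is the trade-off the abstract asks readers to weigh.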
Keywords: Breast cancer, Wisconsin dataset, Diagnostic models, Machine learning, Disease
Copyright © SvedbergOpen. All rights reserved